Parameter Learning for a Readability Checking Tool
Authors
Abstract
This paper describes the application of machine learning methods to determine parameters for DeLite, a readability checking tool. DeLite pinpoints text segments that are difficult to understand and computes a global readability score for a given text, defined as a weighted sum of normalized indicator values. Indicator values are numeric properties derived from linguistic units in the text, such as the distance between a verb and its complements or the number of possible antecedents for a pronoun. Indicators are normalized by a function derived from the Fermi function, which has two parameters. To compute the global readability score, DeLite therefore requires, for each indicator, its own parameters for this normalization function and a weight. Several experiments were conducted to determine these parameters using different machine learning approaches. The training data consists of more than 300 user ratings of texts from the municipal domain. The indicator weights are learned using two approaches: i) robust regression with linear optimization and ii) an approximate iterative linear regression algorithm. For evaluation, the computed readability scores are compared to the user ratings. The evaluation showed that iterative linear regression yields a smaller squared error than robust regression, even though it is only approximate. Both methods outperform an initial manual setting, and both leave essentially the same set of indicators with non-zero weights.
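The scoring scheme described in the abstract can be illustrated with a minimal sketch. It assumes the standard Fermi form 1 / (exp((x - mu) / delta) + 1) as the normalization; the abstract says only that DeLite uses a function derived from the Fermi function, so the exact variant may differ. The indicator names, parameter values, and weights below are hypothetical and chosen only for illustration.

```python
import math


def fermi_normalize(value, mu, delta):
    """Map a raw indicator value into (0, 1) with the standard Fermi form.

    mu is the value at which the result is 0.5; delta controls the steepness.
    Assumption: DeLite's actual normalization is a variant of this function.
    """
    z = (value - mu) / delta
    # Clamp the exponent so math.exp cannot overflow on extreme inputs.
    z = max(-700.0, min(700.0, z))
    return 1.0 / (math.exp(z) + 1.0)


def readability_score(values, params, weights):
    """Weighted sum of normalized indicator values."""
    score = 0.0
    for name, weight in weights.items():
        mu, delta = params[name]
        score += weight * fermi_normalize(values[name], mu, delta)
    return score


# Hypothetical indicators, parameters (mu, delta), and weights.
indicator_values = {"verb_complement_distance": 4.0, "pronoun_antecedents": 2.0}
indicator_params = {
    "verb_complement_distance": (4.0, 1.5),
    "pronoun_antecedents": (3.0, 1.0),
}
indicator_weights = {"verb_complement_distance": 0.6, "pronoun_antecedents": 0.4}

score = readability_score(indicator_values, indicator_params, indicator_weights)
print(f"global readability score: {score:.3f}")
```

With this orientation, larger raw values (for example, a longer distance between a verb and its complements) produce smaller normalized values and so lower the score. The learning step in the paper would then fit the weights, and the two normalization parameters per indicator, against the user ratings.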
Similar resources
A Readability Checker with Supervised Learning Using Deep Indicators
Checking for readability or simplicity of texts is important for many institutional and individual users. Formulas for approximately measuring text readability have a long tradition. Usually, they exploit surface-oriented indicators like sentence length, word length, word frequency, etc. However, in many cases, this information is not adequate to realistically approximate the cognitive difficult...
A Readability Checker with Supervised Learning using Deep Syntactic and Semantic Indicators
Checking for readability or simplicity of texts is important for many institutional and individual users. Formulas for approximately measuring text readability have a long tradition. Usually, they exploit surface-oriented indicators like sentence length, word length, word frequency, etc. However, in many cases, this information is not adequate to realistically approximate the cognitive difficul...
Readability Assessment for Text Simplification
We describe a readability assessment approach to support the process of text simplification for poor literacy readers. Given an input text, the goal is to predict its readability level, which corresponds to the literacy level that is expected from the target reader: rudimentary, basic or advanced. We complement features traditionally used for readability assessment with a number of new features...
Using the crowd for readability prediction
Inspired by previous work on crowdsourcing, we investigate two different methodologies to assess the readability of a wide variety of text material by implementing two assessment tools: a lightweight crowdsourcing tool which invites users to provide pairwise comparisons, and a more advanced version where experts can rank a batch of texts based on readability. In order to validate this approach, r...
Machine Learning Methods in Statistical Model Checking and System Design - Tutorial
Recent research has seen an increasingly fertile convergence of ideas from machine learning and formal modelling. Here we review some recently introduced methodologies for model checking and system design/parameter synthesis for logical properties against stochastic dynamical models. The crucial insight is a regularity result which states that the satisfaction probability of a logical formula ...